Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Long to Wide for multiple time varying covariates*



    Hi all, I am trying to convert my data from long format to wide. Each patient (ID), has multiple ECGs, multiple echocardiograms and pacemaker checks at different times points. Below is a simplified version of my database. DIAGNOSISDATE, is when a patient (ID) was diagnosed with heart disease. Then had multiple tests done during follow up. I have the DATE of each test, time from diagnosis for each test (in days) and a variable attributable to each test. For instance, for each ECHODATE I measured patients heart volume, and for each PACEMAKER check date I recorded whether it was abnormal or not, etc. Now all these time varying variables have different dates. I assume the value for my measurements does not change in between my measurements so I should be able to carry forward prior observations. I thought if I create a variable that would include ALL PROCEDURE DATES (ECG, ECHO, PACEMAKER), then I can sort by ID and for every mission row carry forward the prior observation for each variable, but I cannot seem to find a way to do that. Is there a better way to do this before reshaping the data from Long to Wide? Sorry if this is a very basic question. Thank you for your help.


    ID DIAGNOSISDATE ECHODATE_Dx ECHODATE HEARTVOL PACEDATE_Dx PACEDATE ABNLPACE ECGDATE-dx ECGDATE ABNLECG
    2 4/27/09 4 5/1/09 22.46 1289 11/6/12 0.00 24.00 5/21/2009 1
    2 4/27/09 1289 11/6/12 25.46 1456 4/22/13 1.00 441.00 7/12/2010 1
    2 4/27/09 2065 12/22/14 26.82 2065 12/22/14 1.00 756.00 5/23/2011 1
    2 4/27/09 2464 1/25/16 27.66 3059 9/11/17 1.00 1289.00 11/6/2012 1
    2 4/27/09 3213 2/12/18 28.83 3213 2/12/18 0.00 1456.00 4/22/2013 1
    2 4/27/09 3724 7/8/19 27.00 3724 7/8/19 0.00 2065.00 12/22/2014 1
    2 4/27/09 2464.00 1/25/2016 1
    2 4/27/09 2527.00 3/28/2016 1
    2 4/27/09 3059.00 9/11/2017 1
    2 4/27/09 3216.00 2/15/2018 1
    2 4/27/09 3724.00 7/8/2019 1
    1 1/8/07 21 1/29/07 18.89 1764 11/7/11 0.00 217.00 8/13/07 1
    1 1/8/07 1291 7/22/10 19.25 1852 2/3/12 0.00 469.00 4/21/08 1
    1 1/8/07 1680 8/15/11 23.34 2072 9/10/12 0.00 903.00 6/29/09 1
    1 1/8/07 2111 10/19/12 22.13 2086 9/24/12 0.00 1107.00 1/19/10 1
    1 1/8/07 2283 4/9/13 21.15 2111 10/19/12 0.00 1568.00 4/25/11 1
    1 1/8/07 3003 3/30/15 23.41 2123 10/31/12 0.00 2122.00 10/30/12 1
    1 1/8/07 3934 10/16/17 19.50 2254 3/11/13 0.00 2312.00 5/8/13 1
    1 1/8/07 2282 4/8/13 12.00 3003.00 3/30/15 1
    1 1/8/07 2312 5/8/13 0.00 3934.00 10/16/17 1
    1 1/8/07 2327 5/23/13 0.00
    1 1/8/07 2416 8/20/13 0.00
    1 1/8/07 3934 10/16/17 0.00







  • #2
    The above did not come from a Stata data set. It could not have, because hyphens are not legal characters in variable names. If you have not yet imported your data into Stata, then it is premature to ask for help with coding it. If you have, then you should post back with an excerpt of your data that comes from that data set, and you should do it using the -dataex- command. If you are running version 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.


    Comment


    • #3
      Thank you so much for explaining and my apologies for posting the tabular form. I am using Stata/SE 16.0. Would greatly appreciate your help.

      Code:
      * Example generated by -dataex-. To install: ssc install dataex
      clear
      input byte ID int(DIAGNOSISDATE TTEDATE_Dx ECHODATE) double HEARTVOL int(PACEDIAG PACEDATE) byte ABNLPACE int ECGdatedx str10 ECGDate byte ABNLECG
      2 18014    4 18018 22.46 1289 19303  0   24 "5/21/2009"  1
      2 18014 1289 19303 25.46 1456 19470  1  441 "7/12/2010"  1
      2 18014 2065 20079 26.82 2065 20079  1  756 "5/23/2011"  1
      2 18014 2464 20478 27.66 3059 21073  1 1289 "11/6/2012"  1
      2 18014 3213 21227 28.83 3213 21227  0 1456 "4/22/2013"  1
      2 18014 3724 21738    27 3724 21738  0 2065 "12/22/2014" 1
      2 18014    .     .     .    .     .  . 2464 "1/25/2016"  1
      2 18014    .     .     .    .     .  . 2527 "3/28/2016"  1
      2 18014    .     .     .    .     .  . 3059 "9/11/2017"  1
      2 18014    .     .     .    .     .  . 3216 "2/15/2018"  1
      2 18014    .     .     .    .     .  . 3724 "7/8/2019"   1
      1 17174   21 17195 18.89 1764 18938  0  217 " 8/13/2007" 1
      1 17174 1291 18465 19.25 1852 19026  0  469 " 4/21/2008" 1
      1 17174 1680 18854 23.34 2072 19246  0  903 " 6/29/2009" 1
      1 17174 2111 19285 22.13 2086 19260  0 1107 " 1/19/2010" 1
      1 17174 2283 19457 21.15 2111 19285  0 1568 " 4/25/2011" 1
      1 17174 3003 20177 23.41 2123 19297  0 2122 "10/30/2012" 1
      1 17174 3934 21108  19.5 2254 19428  0 2312 "  5/8/2013" 1
      1 17174    .     .     . 2282 19456 12 3003 " 3/30/2015" 1
      1 17174    .     .     . 2312 19486  0 3934 "10/16/2017" 1
      1 17174    .     .     . 2327 19501  0    . ""           .
      1 17174    .     .     . 2416 19590  0    . ""           .
      1 17174    .     .     . 3934 21108  0    . ""           .
      end
      format %tdnn/dd/CCYY DIAGNOSISDATE
      format %tdnn/dd/CCYY ECHODATE
      format %tdnn/dd/CCYY PACEDATE

      Comment


      • #4
        The last date is string. You will need to transform it in a date variable. That said, I see blank spaces in the beginning of some observations, which would prompt to add some trimming beforehand.
        Best regards,

        Marcos

        Comment


        • #5
          Thank you, does this make more sense?

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input byte ID int(DIAGNOSISDATE TTEDATE_Dx ECHODATE) double HEARTVOL int(PACEDIAG PACEDATE) byte ABNLPACE int(ECGdatedx ECGDate) byte ABNLECG
          2 18014    4 18018 22.46 1289 19303  0   24 18038 1
          2 18014 1289 19303 25.46 1456 19470  1  441 18455 1
          2 18014 2065 20079 26.82 2065 20079  1  756 18770 1
          2 18014 2464 20478 27.66 3059 21073  1 1289 19303 1
          2 18014 3213 21227 28.83 3213 21227  0 1456 19470 1
          2 18014 3724 21738    27 3724 21738  0 2065 20079 1
          2 18014    .     .     .    .     .  . 2464 20478 1
          2 18014    .     .     .    .     .  . 2527 20541 1
          2 18014    .     .     .    .     .  . 3059 21073 1
          2 18014    .     .     .    .     .  . 3216 21230 1
          2 18014    .     .     .    .     .  . 3724 21738 1
          1 17174   21 17195 18.89 1764 18938  0  217 17391 1
          1 17174 1291 18465 19.25 1852 19026  0  469 17643 1
          1 17174 1680 18854 23.34 2072 19246  0  903 18077 1
          1 17174 2111 19285 22.13 2086 19260  0 1107 18281 1
          1 17174 2283 19457 21.15 2111 19285  0 1568 18742 1
          1 17174 3003 20177 23.41 2123 19297  0 2122 19296 1
          1 17174 3934 21108  19.5 2254 19428  0 2312 19486 1
          1 17174    .     .     . 2282 19456 12 3003 20177 1
          1 17174    .     .     . 2312 19486  0 3934 21108 1
          1 17174    .     .     . 2327 19501  0    .     . .
          1 17174    .     .     . 2416 19590  0    .     . .
          1 17174    .     .     . 3934 21108  0    .     . .
          end
          format %tdnn/dd/CCYY DIAGNOSISDATE
          format %tdnn/dd/CCYY ECHODATE
          format %tdnn/dd/CCYY PACEDATE
          format %tdnn/dd/CCYY ECGDate

          here is a full description of the data

          obs: 23
          vars: 11 30 Sep 2019 11:19
          --------------------------------------------------------------------------------
          storage display value
          variable name type format label variable label
          --------------------------------------------------------------------------------
          ID byte %10.0g ID
          DIAGNOSISDATE int %td.. DIAGNOSISDATE
          TTEDATE_Dx int %10.0g TTEDATE_Dx
          ECHODATE int %td.. ECHODATE
          HEARTVOL double %14.2f HEARTVOL
          PACEDIAG int %10.0g PACEDIAG
          PACEDATE int %td.. PACEDATE
          ABNLPACE byte %14.2f ABNLPACE
          ECGdatedx int %14.2f ECG-date-dx
          ECGDate int %td.. ECG-Date
          ABNLECG byte %10.0g ABNLECG
          --

          Comment


          • #6
            I've been working on this for a while and it's very confusing. What is going on with DIAGNOSISDATE and TTEDATE_Dx? It seems that the same patient can have multiple observations with the same DIAGNOSISDATE, and different TTEDATE_Dx values associated with them. That does not happen for the ECHO, PACE, or ECG variables: there we find only one observation of these for any given person on a given date. I have code that would work were it not for DIAGNOSISDATE and TTEDATE_Dx, but they break the code, and I don't understand how they would even fit into the final results you are looking for, so I am stymied at this point. Please show what the results you want to get would look like for, say, ID 1.

            Comment


            • #7
              Thank you so much for writing back and for trying. I understand how this can be confusing.

              So basically there are two patients in the dataset. ID 1 and 2 each patient was diagnosed with a special heart disease at a certain date (Diagnosis date). That's when our disease starts, then had several follow-up tests ECHO, PACE, and ECG at different time points (hence multiple rows). TTE means echo (sorry for not clarifying that) so TTEDATE-Dx is time from diagnosis of heart disease to the date echo was done in "days" and the same applies to PACEDIAG and ECGDATE-Dx (these are time from diagnosis to pacemaker check and time from diagnosis to ECGdate).

              So before even transforming data from long to wide. I am thinking I need to create a balanced dataset. Number of rows for each patient will be number of unique procedure dates. For example if this dataset only included patient 2 with the following ECHOdates, ECG dates and pace dates:
              ECHODATE
              5/1/09
              11/6/12
              12/22/14
              1/25/16
              2/12/18
              7/8/19
              PACEDATE
              11/6/12
              4/22/13
              12/22/14
              9/11/17
              2/12/18
              7/8/19
              ECG-Date
              5/21/09
              7/12/10
              5/23/11
              11/6/12
              4/22/13
              12/22/14
              1/25/16
              3/28/16
              9/11/17
              2/15/18
              7/8/19

              then I would have 13 unique follow up dates. the dataset would need 13 rows, and each row will represent a follow up date, then on the dates that echo was done we have the actual echo measurement (i.e. Heartvolume); and the days we have ECGs done, we can carry forward the previous echo measurement (HEARTVOL) and so on.
              All follow up dates
              5/1/09
              5/21/09
              7/12/10
              5/23/11
              11/6/12
              4/22/13
              12/22/14
              1/25/16
              3/28/16
              9/11/17
              2/12/18
              2/15/18
              7/8/19

              Then once we do this I believe I can reshape data from long to wide. Does this make sense? Is there a better way to do this?









              .

              Comment


              • #8
                Now I get it. The code below does what you want. I had to rename your variables so that they were more systematic and amenable to -reshape-ing. The decision to make them all lower case was just for my convenience, as that is the style I'm accustomed to. You don't need to do that part, although you will need to change the code to use mixed and upper case names if you don't do it.

                Code:
                * Example generated by -dataex-. To install: ssc install dataex
                clear
                input byte ID int(DIAGNOSISDATE TTEDATE_Dx ECHODATE) double HEARTVOL int(PACEDIAG PACEDATE) byte ABNLPACE int(ECGdatedx ECGDate) byte ABNLECG
                2 18014    4 18018 22.46 1289 19303  0   24 18038 1
                2 18014 1289 19303 25.46 1456 19470  1  441 18455 1
                2 18014 2065 20079 26.82 2065 20079  1  756 18770 1
                2 18014 2464 20478 27.66 3059 21073  1 1289 19303 1
                2 18014 3213 21227 28.83 3213 21227  0 1456 19470 1
                2 18014 3724 21738    27 3724 21738  0 2065 20079 1
                2 18014    .     .     .    .     .  . 2464 20478 1
                2 18014    .     .     .    .     .  . 2527 20541 1
                2 18014    .     .     .    .     .  . 3059 21073 1
                2 18014    .     .     .    .     .  . 3216 21230 1
                2 18014    .     .     .    .     .  . 3724 21738 1
                1 17174   21 17195 18.89 1764 18938  0  217 17391 1
                1 17174 1291 18465 19.25 1852 19026  0  469 17643 1
                1 17174 1680 18854 23.34 2072 19246  0  903 18077 1
                1 17174 2111 19285 22.13 2086 19260  0 1107 18281 1
                1 17174 2283 19457 21.15 2111 19285  0 1568 18742 1
                1 17174 3003 20177 23.41 2123 19297  0 2122 19296 1
                1 17174 3934 21108  19.5 2254 19428  0 2312 19486 1
                1 17174    .     .     . 2282 19456 12 3003 20177 1
                1 17174    .     .     . 2312 19486  0 3934 21108 1
                1 17174    .     .     . 2327 19501  0    .     . .
                1 17174    .     .     . 2416 19590  0    .     . .
                1 17174    .     .     . 3934 21108  0    .     . .
                end
                format %tdnn/dd/CCYY DIAGNOSISDATE
                format %tdnn/dd/CCYY ECHODATE
                format %tdnn/dd/CCYY PACEDATE
                format %tdnn/dd/CCYY ECGDate
                rename *, lower
                
                //  MAKE THE VARIABLE NAMES SYSTEMATIC
                rename heartvol echoresult
                rename pacediag pacedresult
                rename abnlpace paceabnl
                rename abnlecg ecgabnl
                
                drop ttedate_dx ecgdatedx
                
                //  MAKE A LONG LAYOUT LIST OF ALL THE
                //  PROCEDURES AND ENCOUNTERS
                gen long obs_no = _n
                reshape long @date @result @abnl, i(obs_no) j(test) string
                drop if missing(date)
                
                //  FILL IN ALL COMBINATIONS OF TEST AND DATE FOR EACH ID
                preserve
                keep test date
                duplicates drop
                tempfile framework
                save `framework'
                restore
                merge m:1 test date using `framework', assert(match using) nogenerate
                drop obs_no
                duplicates drop
                
                //  SPREAD DIAGNOSIS DATE TO EVERY OBSERVATION OF THE ID
                by id, sort: egen diagnosis_date = max(cond(test == "diagnosis", date, .))
                format date diagnosis_date %tdnn/dd/CCYY
                drop if test == "diagnosis"
                
                //  NOW GO WIDE: ONE OBS PER ID#DATE
                reshape wide @result @abnl, i(id date) j(test) string
                
                //  CLEAN UP
                order diagnosis_date, after(id)
                drop ecgresult echoabnl paceresult
                //  REPLACE MISSING RESULTS OF A TEST BY
                //  CARRYING FORWARD THE LAST OBSERVATION
                foreach v of varlist ecgabnl echoresult paceabnl {
                    by id (date), sort: replace `v' = `v'[_n-1] if missing(`v')
                }
                gen days_from_diagnosis = date - diagnosis_date
                rename echoresult heartvolume

                Comment


                • #9
                  Thank you very much for all your help! I was trying to figure this out for days!! really appreciate it!

                  Comment

                  Working...
                  X